Migrate to new stats falsification rules #8345

Merged
gatesn merged 30 commits into develop from ngates/public-stats-rewrite-rules on Jun 18, 2026

@gatesn gatesn commented Jun 10, 2026

Contributor

Switch over to using the new rewrite rule registry from the session, instead of using the scalar fn vtable.

@gatesn gatesn added the changelog/chore A trivial change label Jun 10, 2026
AdamGS added a commit that referenced this pull request Jun 11, 2026
## Summary

Adds docs for `StatsRewriteRule` and its functions.  

Can be considered a follow-up for
#8345, but can be individually
merged.

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
@gatesn gatesn force-pushed the ngates/public-stats-rewrite-rules branch from 033c73f to 5ed375f Compare June 11, 2026 15:49
gatesn added 2 commits June 11, 2026 11:51
Port file pruning to session stats rewrites

Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn force-pushed the ngates/public-stats-rewrite-rules branch from 5ed375f to e8dd011 Compare June 11, 2026 15:51
codspeed-hq Bot commented Jun 11, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 8 regressed benchmarks
✅ 1569 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | take_10k_random | 197.9 µs | 255.5 µs | -22.53% |
| Simulation | take_10k_contiguous | 218.5 µs | 275.6 µs | -20.72% |
| Simulation | patched_take_10k_contiguous_patches | 232.2 µs | 290.9 µs | -20.17% |
| Simulation | patched_take_10k_random | 244.2 µs | 303.4 µs | -19.51% |
| Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 178 µs | 213.2 µs | -16.53% |
| Simulation | chunked_varbinview_opt_into_canonical[(1000, 10)] | 193.4 µs | 229.3 µs | -15.67% |
| Simulation | decompress_rd[f64, (100000, 0.01)] | 890.6 µs | 1,024.5 µs | -13.07% |
| Simulation | decompress_rd[f64, (100000, 0.1)] | 890.6 µs | 1,024.5 µs | -13.07% |
| Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 198.5 µs | 162.7 µs | +21.99% |
| Simulation | varbinview_large | 130.4 µs | 112.6 µs | +15.81% |
| Simulation | chunked_varbinview_canonical_into[(100, 100)] | 308.7 µs | 273.6 µs | +12.81% |
| Simulation | chunked_varbinview_into_canonical[(100, 100)] | 367.7 µs | 332.5 µs | +10.57% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ngates/public-stats-rewrite-rules (3e7c2d6) with develop (98d4a6a)

gatesn added 3 commits June 11, 2026 14:29
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>

Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>

Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn marked this pull request as ready for review June 11, 2026 20:10
@gatesn gatesn requested a review from a team June 11, 2026 20:10
@gatesn gatesn enabled auto-merge (squash) June 11, 2026 20:11
gatesn added 2 commits June 11, 2026 16:13
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn changed the title Make stats rewrite rules public Migrate to new stats falsification rules Jun 11, 2026
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Comment thread .github/workflows/ci.yml Outdated
use crate::scalar::Scalar;
use crate::scalar_fn::fns::stat::StatFn;

/// A target that can bind abstract statistics to concrete expressions.

Contributor

What does it mean to bind a statistic to an expression? What makes statistics abstract?

Contributor Author

> What does it mean to bind a statistic to an expression? What makes statistics abstract?

Yeah, I am intermingling the stats work with an expressions change, so I think the "stat" expression explicitly becomes an `Expression::Placeholder`, meaning it must be replaced prior to execution.

It's the same logic as our StatCatalog, except that the current StatCatalog means you have to re-run falsification over the entire expression any time your stats come from a different place, e.g. FileStats vs ZoneMap vs ArrayStats.

Here you take the falsified expression, then "bind" the stats from wherever you get them.
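
To make the reply above concrete, here is a minimal sketch of the "falsify once, bind many times" idea. Everything in it is a hypothetical stand-in: the `Expr` enum, the `StatSource` map, and the `bind` function are toy types invented for illustration, not the Vortex `Expression` or stats APIs.

```rust
use std::collections::HashMap;

/// Toy expression tree (an assumption for illustration, not Vortex's `Expression`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A concrete literal value.
    Lit(i64),
    /// Placeholder for a statistic of a named field, e.g. ("x", "min").
    /// It must be replaced before the expression can be executed.
    StatPlaceholder { field: String, stat: String },
    /// `lhs >= rhs`
    GtEq(Box<Expr>, Box<Expr>),
}

/// Hypothetical stats source, standing in for file stats, a zone map, or array stats.
type StatSource = HashMap<(String, String), i64>;

/// The falsified form of the predicate `x < 5`: no row can match when `min(x) >= 5`.
fn falsified_x_lt_5() -> Expr {
    Expr::GtEq(
        Box::new(Expr::StatPlaceholder {
            field: "x".to_string(),
            stat: "min".to_string(),
        }),
        Box::new(Expr::Lit(5)),
    )
}

/// Replace every placeholder with a literal taken from `source`.
/// Returns `None` when a required statistic is not available.
fn bind(expr: &Expr, source: &StatSource) -> Option<Expr> {
    match expr {
        Expr::Lit(v) => Some(Expr::Lit(*v)),
        Expr::StatPlaceholder { field, stat } => source
            .get(&(field.clone(), stat.clone()))
            .map(|v| Expr::Lit(*v)),
        Expr::GtEq(lhs, rhs) => Some(Expr::GtEq(
            Box::new(bind(lhs, source)?),
            Box::new(bind(rhs, source)?),
        )),
    }
}
```

In this sketch the falsified expression is built once and `bind` is called separately for each stats source, so falsification does not need to be re-run when the stats come from a different place.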

Comment thread vortex-file/src/v2/file_stats_reader.rs Outdated
Comment thread vortex-file/src/v2/file_stats_reader.rs Outdated
Comment thread vortex-array/src/stats/bind.rs
Comment thread vortex-array/src/stats/bind.rs
Comment thread vortex-array/src/stats/bind.rs Outdated
Comment thread vortex-file/src/v2/file_stats_reader.rs Outdated
@@ -115,14 +119,52 @@ impl FileStatsLayoutReader {
Ok(result.as_bool().value() == Some(true))

Contributor

nit - unwrap_or_default()
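
The nit can be illustrated in isolation. This sketch assumes the value being compared is a plain `Option<bool>`, which the `== Some(true)` comparison suggests. The function names are hypothetical.

```rust
/// Original style: compare the optional boolean against `Some(true)`.
fn is_true_explicit(value: Option<bool>) -> bool {
    value == Some(true)
}

/// Suggested style: `bool::default()` is `false`, so `None` maps to `false`.
fn is_true_default(value: Option<bool>) -> bool {
    value.unwrap_or_default()
}
```

Both forms treat `None` and `Some(false)` as `false`, so the change is purely stylistic.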

refs
}

fn collect_referenced_stat_field_names(expr: &Expression, refs: &mut HashSet<FieldName>) {

Contributor

We have `Node` implemented for `Expression`, so you should be able to visit the tree using that instead of hand-rolling the recursion.

}
}

fn bool_literal(expr: &Expression) -> Option<Option<bool>> {

Contributor

called once, inline

return Ok(None);
};
let required_stats = filter_required_stats(&lowered, binder.required_stats);
if required_stats.map().is_empty() && !matches!(bool_literal(&lowered), Some(Some(true))) {

Contributor

maybe add a comment explaining this if statement?

available_stats,
required_stats: Relation::new(),
};
let Some(lowered) = bind_stats(predicate, &mut binder)? else {

Contributor

Maybe I'm just stupid - but I think both "lowering" and "bindings" are worth defining and explaining somewhere.

AdamGS commented Jun 12, 2026

Contributor

I think I mostly have a lot of questions 😅

gatesn added 2 commits June 16, 2026 15:11
@gatesn gatesn added the action/benchmark Trigger full benchmarks to run on this PR label Jun 17, 2026
@github-actions github-actions Bot removed the action/benchmark Trigger full benchmarks to run on this PR label Jun 17, 2026
github-actions Bot commented Jun 17, 2026

Contributor

Polar Signals Profiling Results

Latest Run

| Status | Commit | Job Attempt | Link |
|---|---|---|---|
| 🟢 Done | 22cd9de | 1 | Explore Profiling Data |

Previous Runs (2)

| Status | Commit | Job Attempt | Link |
|---|---|---|---|
| 🟢 Done | 91fc94f | 1 | Explore Profiling Data |
| 🟢 Done | fe12e71 | 1 | Explore Profiling Data |

Powered by Polar Signals Cloud

github-actions Bot commented Jun 17, 2026

Contributor

Benchmarks: PolarSignals Profiling

Vortex (geomean): 1.009x ➖

How to read Verdict and Engines
  • Verdict: Overall PR-level signal after subtracting baseline drift estimated from Parquet control rows. It can be Likely improvement, Likely regression, or No clear signal.
  • Engines: Per-engine attribution. DataFusion is compared against DataFusion/Parquet controls; DuckDB is compared against DuckDB/Parquet controls. This answers whether each engine improved or regressed independently.
  • Confidence: Based on directional consistency, share of rows above the noise floor, and control-run noise.

datafusion / vortex-file-compressed (1.009x ➖, 0↑ 2↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
polarsignals_q00/datafusion:vortex-file-compressed 🚨 118941762 102768567 1.16
polarsignals_q01/datafusion:vortex-file-compressed 256448926 279989372 0.92
polarsignals_q02/datafusion:vortex-file-compressed 23214759 24125205 0.96
polarsignals_q03/datafusion:vortex-file-compressed 265061202 256987491 1.03
polarsignals_q04/datafusion:vortex-file-compressed 9833143 9817989 1.00
polarsignals_q05/datafusion:vortex-file-compressed 14880512 14890139 1.00
polarsignals_q06/datafusion:vortex-file-compressed 21152744 20518814 1.03
polarsignals_q07/datafusion:vortex-file-compressed 🚨 15090756 13181202 1.14
polarsignals_q08/datafusion:vortex-file-compressed 384875010 400734886 0.96
polarsignals_q09/datafusion:vortex-file-compressed 11315723 12360100 0.92

No file size changes detected.

github-actions Bot commented Jun 17, 2026

Contributor

Benchmarks: TPC-H SF=1 on NVME

Verdict: No clear signal (low confidence)
Attributed Vortex impact: +0.4%
Engines: DataFusion No clear signal (-0.5%, environment too noisy confidence) · DuckDB No clear signal (+1.3%, environment too noisy confidence)
Vortex (geomean): 0.947x ➖
Parquet (geomean): 0.945x ➖
Shifts: Parquet (control) -5.5% · Median polish -5.1%

datafusion / vortex-file-compressed (0.943x ➖, 2↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
tpch_q01/datafusion:vortex-file-compressed 51995100 54011867 0.96
tpch_q02/datafusion:vortex-file-compressed 22680714 24601044 0.92
tpch_q03/datafusion:vortex-file-compressed 🚀 30313032 33948168 0.89
tpch_q04/datafusion:vortex-file-compressed 19910967 20274586 0.98
tpch_q05/datafusion:vortex-file-compressed 46224112 50316208 0.92
tpch_q06/datafusion:vortex-file-compressed 10066504 9625637 1.05
tpch_q07/datafusion:vortex-file-compressed 50986978 55715522 0.92
tpch_q08/datafusion:vortex-file-compressed 38725230 41721086 0.93
tpch_q09/datafusion:vortex-file-compressed 50697066 54581620 0.93
tpch_q10/datafusion:vortex-file-compressed 32275913 33889127 0.95
tpch_q11/datafusion:vortex-file-compressed 16509839 17309504 0.95
tpch_q12/datafusion:vortex-file-compressed 22846844 23913692 0.96
tpch_q13/datafusion:vortex-file-compressed 26445874 28596805 0.92
tpch_q14/datafusion:vortex-file-compressed 14607862 14810463 0.99
tpch_q15/datafusion:vortex-file-compressed 22127875 23757407 0.93
tpch_q16/datafusion:vortex-file-compressed 19619395 20364042 0.96
tpch_q17/datafusion:vortex-file-compressed 65870449 70255315 0.94
tpch_q18/datafusion:vortex-file-compressed 🚀 74437954 82725835 0.90
tpch_q19/datafusion:vortex-file-compressed 18116428 18435209 0.98
tpch_q20/datafusion:vortex-file-compressed 29399980 30647694 0.96
tpch_q21/datafusion:vortex-file-compressed 69078266 75857688 0.91
tpch_q22/datafusion:vortex-file-compressed 11789659 12875931 0.92
datafusion / vortex-compact (0.938x ➖, 1↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
tpch_q01/datafusion:vortex-compact 57424214 62801576 0.91
tpch_q02/datafusion:vortex-compact 25519439 27534724 0.93
tpch_q03/datafusion:vortex-compact 32356072 34478135 0.94
tpch_q04/datafusion:vortex-compact 21878301 23332572 0.94
tpch_q05/datafusion:vortex-compact 47910082 51118953 0.94
tpch_q06/datafusion:vortex-compact 12426153 13178809 0.94
tpch_q07/datafusion:vortex-compact 🚀 54239647 60336769 0.90
tpch_q08/datafusion:vortex-compact 42313130 43771845 0.97
tpch_q09/datafusion:vortex-compact 53924087 59180273 0.91
tpch_q10/datafusion:vortex-compact 35176206 39050263 0.90
tpch_q11/datafusion:vortex-compact 17281350 18393308 0.94
tpch_q12/datafusion:vortex-compact 29856349 32884972 0.91
tpch_q13/datafusion:vortex-compact 32745420 33253667 0.98
tpch_q14/datafusion:vortex-compact 18233276 19064933 0.96
tpch_q15/datafusion:vortex-compact 29775733 31805252 0.94
tpch_q16/datafusion:vortex-compact 24780156 25144649 0.99
tpch_q17/datafusion:vortex-compact 69190542 73567040 0.94
tpch_q18/datafusion:vortex-compact 76879645 81705820 0.94
tpch_q19/datafusion:vortex-compact 39063676 41096601 0.95
tpch_q20/datafusion:vortex-compact 34792518 36886052 0.94
tpch_q21/datafusion:vortex-compact 75827850 81801646 0.93
tpch_q22/datafusion:vortex-compact 13405478 14079383 0.95
datafusion / parquet (0.945x ➖, 2↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
tpch_q01/datafusion:parquet 🚀 97795461 124305003 0.79
tpch_q02/datafusion:parquet 60510547 64021802 0.95
tpch_q03/datafusion:parquet 82458659 86167114 0.96
tpch_q04/datafusion:parquet 43810237 47821479 0.92
tpch_q05/datafusion:parquet 94237023 100351081 0.94
tpch_q06/datafusion:parquet 42379963 43848913 0.97
tpch_q07/datafusion:parquet 🚀 94650764 108109705 0.88
tpch_q08/datafusion:parquet 95378768 100468059 0.95
tpch_q09/datafusion:parquet 117267835 128678230 0.91
tpch_q10/datafusion:parquet 115412435 122614973 0.94
tpch_q11/datafusion:parquet 39668194 43418167 0.91
tpch_q12/datafusion:parquet 85840781 78093290 1.10
tpch_q13/datafusion:parquet 195254080 206327333 0.95
tpch_q14/datafusion:parquet 45340632 45668051 0.99
tpch_q15/datafusion:parquet 59257387 64101302 0.92
tpch_q16/datafusion:parquet 42551680 44356790 0.96
tpch_q17/datafusion:parquet 138109727 140391443 0.98
tpch_q18/datafusion:parquet 157626673 162158719 0.97
tpch_q19/datafusion:parquet 80561943 81840816 0.98
tpch_q20/datafusion:parquet 72793434 72882381 1.00
tpch_q21/datafusion:parquet 138523194 151122824 0.92
tpch_q22/datafusion:parquet 43590312 46515294 0.94
datafusion / arrow (0.938x ➖, 3↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
tpch_q01/datafusion:arrow 60814367 65656560 0.93
tpch_q02/datafusion:arrow 17004299 18061945 0.94
tpch_q03/datafusion:arrow 32643310 36178924 0.90
tpch_q04/datafusion:arrow 26446107 28226627 0.94
tpch_q05/datafusion:arrow 61602788 61332258 1.00
tpch_q06/datafusion:arrow 24428895 22368340 1.09
tpch_q07/datafusion:arrow 104764276 110222479 0.95
tpch_q08/datafusion:arrow 🚀 41311133 46204032 0.89
tpch_q09/datafusion:arrow 63979249 70900848 0.90
tpch_q10/datafusion:arrow 🚀 47508503 55173668 0.86
tpch_q11/datafusion:arrow 9016024 9611057 0.94
tpch_q12/datafusion:arrow 50018798 53320773 0.94
tpch_q13/datafusion:arrow 44952204 48462618 0.93
tpch_q14/datafusion:arrow 23307300 24788931 0.94
tpch_q15/datafusion:arrow 46408974 49395970 0.94
tpch_q16/datafusion:arrow 16324746 16669317 0.98
tpch_q17/datafusion:arrow 65194275 70558807 0.92
tpch_q18/datafusion:arrow 109965974 118882346 0.92
tpch_q19/datafusion:arrow 37673155 39699101 0.95
tpch_q20/datafusion:arrow 36174340 38963889 0.93
tpch_q21/datafusion:arrow 156255264 163792299 0.95
tpch_q22/datafusion:arrow 🚀 11914252 13238071 0.90
duckdb / vortex-file-compressed (0.954x ➖, 0↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
tpch_q01/duckdb:vortex-file-compressed 31230583 31678237 0.99
tpch_q02/duckdb:vortex-file-compressed 25553264 26821615 0.95
tpch_q03/duckdb:vortex-file-compressed 33163649 33618142 0.99
tpch_q04/duckdb:vortex-file-compressed 28375952 29913254 0.95
tpch_q05/duckdb:vortex-file-compressed 37270292 39670034 0.94
tpch_q06/duckdb:vortex-file-compressed 7931866 8415787 0.94
tpch_q07/duckdb:vortex-file-compressed 35456125 37921159 0.93
tpch_q08/duckdb:vortex-file-compressed 40825925 38955873 1.05
tpch_q09/duckdb:vortex-file-compressed 57977121 63774804 0.91
tpch_q10/duckdb:vortex-file-compressed 41506765 43378791 0.96
tpch_q11/duckdb:vortex-file-compressed 14955467 15304165 0.98
tpch_q12/duckdb:vortex-file-compressed 23417394 24134894 0.97
tpch_q13/duckdb:vortex-file-compressed 40791382 45121119 0.90
tpch_q14/duckdb:vortex-file-compressed 21676077 22691189 0.96
tpch_q15/duckdb:vortex-file-compressed 17326006 17146917 1.01
tpch_q16/duckdb:vortex-file-compressed 29499550 30973158 0.95
tpch_q17/duckdb:vortex-file-compressed 23988695 25102327 0.96
tpch_q18/duckdb:vortex-file-compressed 53301531 56504072 0.94
tpch_q19/duckdb:vortex-file-compressed 28878549 30954163 0.93
tpch_q20/duckdb:vortex-file-compressed 33396605 34827918 0.96
tpch_q21/duckdb:vortex-file-compressed 101445242 107860797 0.94
tpch_q22/duckdb:vortex-file-compressed 17294021 19184630 0.90
duckdb / vortex-compact (0.954x ➖, 1↑ 1↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
tpch_q01/duckdb:vortex-compact 38209863 40235109 0.95
tpch_q02/duckdb:vortex-compact 33060243 35453598 0.93
tpch_q03/duckdb:vortex-compact 34156075 36347043 0.94
tpch_q04/duckdb:vortex-compact 30734713 31756945 0.97
tpch_q05/duckdb:vortex-compact 40669893 43818481 0.93
tpch_q06/duckdb:vortex-compact 🚀 10340206 11514486 0.90
tpch_q07/duckdb:vortex-compact 41103688 42987128 0.96
tpch_q08/duckdb:vortex-compact 🚨 47978191 42716371 1.12
tpch_q09/duckdb:vortex-compact 66091998 70650616 0.94
tpch_q10/duckdb:vortex-compact 45692465 48689358 0.94
tpch_q11/duckdb:vortex-compact 18796463 19052806 0.99
tpch_q12/duckdb:vortex-compact 29849358 30742131 0.97
tpch_q13/duckdb:vortex-compact 46797672 49839179 0.94
tpch_q14/duckdb:vortex-compact 25958215 27433294 0.95
tpch_q15/duckdb:vortex-compact 19995673 21095207 0.95
tpch_q16/duckdb:vortex-compact 32297579 34602798 0.93
tpch_q17/duckdb:vortex-compact 28486278 30431382 0.94
tpch_q18/duckdb:vortex-compact 54172445 55095612 0.98
tpch_q19/duckdb:vortex-compact 32953106 35463035 0.93
tpch_q20/duckdb:vortex-compact 39958517 41271722 0.97
tpch_q21/duckdb:vortex-compact 105217347 110757360 0.95
tpch_q22/duckdb:vortex-compact 18619206 19863756 0.94
duckdb / parquet (0.945x ➖, 4↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
tpch_q01/duckdb:parquet 76024536 77092522 0.99
tpch_q02/duckdb:parquet 39087971 40896118 0.96
tpch_q03/duckdb:parquet 71837624 72644942 0.99
tpch_q04/duckdb:parquet 49242878 50688935 0.97
tpch_q05/duckdb:parquet 68265500 70065133 0.97
tpch_q06/duckdb:parquet 22315943 22776522 0.98
tpch_q07/duckdb:parquet 🚀 69322665 81475354 0.85
tpch_q08/duckdb:parquet 82549881 85433846 0.97
tpch_q09/duckdb:parquet 🚀 133955033 158411086 0.85
tpch_q10/duckdb:parquet 125908946 134530489 0.94
tpch_q11/duckdb:parquet 22327249 23175996 0.96
tpch_q12/duckdb:parquet 🚀 47038823 54175875 0.87
tpch_q13/duckdb:parquet 251608583 271657894 0.93
tpch_q14/duckdb:parquet 51926308 52356879 0.99
tpch_q15/duckdb:parquet 25914000 26531701 0.98
tpch_q16/duckdb:parquet 58034028 60542085 0.96
tpch_q17/duckdb:parquet 60024603 57312023 1.05
tpch_q18/duckdb:parquet 119183930 120660293 0.99
tpch_q19/duckdb:parquet 🚀 68786229 90920240 0.76
tpch_q20/duckdb:parquet 65787974 66791453 0.98
tpch_q21/duckdb:parquet 177181516 189299110 0.94
tpch_q22/duckdb:parquet 53766991 54599175 0.98
duckdb / duckdb (0.963x ➖, 0↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
tpch_q01/duckdb:duckdb 17651424 18433945 0.96
tpch_q02/duckdb:duckdb 15241241 15197621 1.00
tpch_q03/duckdb:duckdb 22559396 23739741 0.95
tpch_q04/duckdb:duckdb 20955693 21885894 0.96
tpch_q05/duckdb:duckdb 23362095 23871584 0.98
tpch_q06/duckdb:duckdb 7009401 7369167 0.95
tpch_q07/duckdb:duckdb 25385975 26975741 0.94
tpch_q08/duckdb:duckdb 24100713 25345273 0.95
tpch_q09/duckdb:duckdb 58364025 62421473 0.93
tpch_q10/duckdb:duckdb 52038732 52639292 0.99
tpch_q11/duckdb:duckdb 7302644 7797110 0.94
tpch_q12/duckdb:duckdb 17841840 18280955 0.98
tpch_q13/duckdb:duckdb 40571549 40825925 0.99
tpch_q14/duckdb:duckdb 21528425 22473305 0.96
tpch_q15/duckdb:duckdb 13522142 14358787 0.94
tpch_q16/duckdb:duckdb 26294220 27894539 0.94
tpch_q17/duckdb:duckdb 16129514 16481481 0.98
tpch_q18/duckdb:duckdb 41309513 43078695 0.96
tpch_q19/duckdb:duckdb 31858453 32761948 0.97
tpch_q20/duckdb:duckdb 25692345 25694417 1.00
tpch_q21/duckdb:duckdb 62119492 64543664 0.96
tpch_q22/duckdb:duckdb 25448514 26321614 0.97

File Size Changes (10 files changed, +2.2% overall, 9↑ 1↓)
| File | Scale | Format | Base | HEAD | Change | % |
|---|---|---|---|---|---|---|
| orders_0.vortex | 1.0 | vortex-file-compressed | 35.24 MB | 38.60 MB | +3.37 MB | +9.6% |
| lineitem_0.vortex | 1.0 | vortex-file-compressed | 82.19 MB | 84.99 MB | +2.80 MB | +3.4% |
| lineitem_1.vortex | 1.0 | vortex-file-compressed | 82.17 MB | 84.61 MB | +2.44 MB | +3.0% |
| orders_0.vortex | 1.0 | vortex-compact | 31.73 MB | 32.20 MB | +482.27 KB | +1.5% |
| part_0.vortex | 1.0 | vortex-compact | 3.39 MB | 3.43 MB | +42.19 KB | +1.2% |
| supplier_0.vortex | 1.0 | vortex-file-compressed | 601.19 KB | 608.12 KB | +6.94 KB | +1.2% |
| partsupp_0.vortex | 1.0 | vortex-compact | 20.93 MB | 21.10 MB | +172.44 KB | +0.8% |
| part_0.vortex | 1.0 | vortex-file-compressed | 4.98 MB | 5.01 MB | +38.17 KB | +0.7% |
| partsupp_0.vortex | 1.0 | vortex-file-compressed | 23.69 MB | 23.78 MB | +91.05 KB | +0.4% |
| customer_0.vortex | 1.0 | vortex-file-compressed | 8.91 MB | 8.89 MB | -19.10 KB | -0.2% |

Totals:

  • vortex-compact: 190.27 MB → 190.95 MB (+0.4%)
  • vortex-file-compressed: 238.04 MB → 246.76 MB (+3.7%)

github-actions Bot commented Jun 17, 2026

Contributor

Benchmarks: FineWeb NVMe

Verdict: No clear signal (low confidence)
Attributed Vortex impact: -1.1%
Engines: DataFusion No clear signal (-1.3%, low confidence) · DuckDB No clear signal (-0.9%, low confidence)
Vortex (geomean): 0.997x ➖
Parquet (geomean): 1.008x ➖
Shifts: Parquet (control) +0.8% · Median polish +0.5%

datafusion / vortex-file-compressed (1.000x ➖, 0↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
fineweb_q00/datafusion:vortex-file-compressed 5319594 5113443 1.04
fineweb_q01/datafusion:vortex-file-compressed 32926687 32716324 1.01
fineweb_q02/datafusion:vortex-file-compressed 36944257 39746691 0.93
fineweb_q03/datafusion:vortex-file-compressed 64873600 64866181 1.00
fineweb_q04/datafusion:vortex-file-compressed 281244873 277276416 1.01
fineweb_q05/datafusion:vortex-file-compressed 220410555 221775928 0.99
fineweb_q06/datafusion:vortex-file-compressed 50660343 50031057 1.01
fineweb_q07/datafusion:vortex-file-compressed 53311896 55144412 0.97
fineweb_q08/datafusion:vortex-file-compressed 23767522 22816656 1.04
datafusion / vortex-compact (1.000x ➖, 1↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
fineweb_q00/datafusion:vortex-compact 5992404 5978045 1.00
fineweb_q01/datafusion:vortex-compact 97922666 94814832 1.03
fineweb_q02/datafusion:vortex-compact 115234763 105320532 1.09
fineweb_q03/datafusion:vortex-compact 869177865 880740581 0.99
fineweb_q04/datafusion:vortex-compact 915826054 930052871 0.98
fineweb_q05/datafusion:vortex-compact 829794541 813032077 1.02
fineweb_q06/datafusion:vortex-compact 467936252 473354363 0.99
fineweb_q07/datafusion:vortex-compact 485049231 479697555 1.01
fineweb_q08/datafusion:vortex-compact 🚀 21208064 23802293 0.89
datafusion / parquet (1.013x ➖, 0↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
fineweb_q00/datafusion:parquet 6131422 6645407 0.92
fineweb_q01/datafusion:parquet 299884837 297715662 1.01
fineweb_q02/datafusion:parquet 306421902 301057250 1.02
fineweb_q03/datafusion:parquet 305418978 280049240 1.09
fineweb_q04/datafusion:parquet 302980492 307414786 0.99
fineweb_q05/datafusion:parquet 306508976 306095885 1.00
fineweb_q06/datafusion:parquet 308764141 290377754 1.06
fineweb_q07/datafusion:parquet 286378392 281783589 1.02
fineweb_q08/datafusion:parquet 285209709 278235737 1.03
duckdb / vortex-file-compressed (0.946x ➖, 1↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
fineweb_q00/duckdb:vortex-file-compressed 3546943 3386565 1.05
fineweb_q01/duckdb:vortex-file-compressed 33091892 34872046 0.95
fineweb_q02/duckdb:vortex-file-compressed 37965990 40277077 0.94
fineweb_q03/duckdb:vortex-file-compressed 🚀 117700388 157248266 0.75
fineweb_q04/duckdb:vortex-file-compressed 277171255 278708279 0.99
fineweb_q05/duckdb:vortex-file-compressed 214713492 225306212 0.95
fineweb_q06/duckdb:vortex-file-compressed 52694411 50968486 1.03
fineweb_q07/duckdb:vortex-file-compressed 54429627 56648705 0.96
fineweb_q08/duckdb:vortex-file-compressed 20948472 22786678 0.92
duckdb / vortex-compact (1.042x ➖, 0↑ 2↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
fineweb_q00/duckdb:vortex-compact 4313215 4514033 0.96
fineweb_q01/duckdb:vortex-compact 🚨 113934673 96620270 1.18
fineweb_q02/duckdb:vortex-compact 114680801 116462206 0.98
fineweb_q03/duckdb:vortex-compact 883278754 859889929 1.03
fineweb_q04/duckdb:vortex-compact 914419389 901669895 1.01
fineweb_q05/duckdb:vortex-compact 814712161 805566974 1.01
fineweb_q06/duckdb:vortex-compact 466503655 459236624 1.02
fineweb_q07/duckdb:vortex-compact 528575105 482400469 1.10
fineweb_q08/duckdb:vortex-compact 🚨 22085437 19778627 1.12
duckdb / parquet (1.002x ➖, 0↑ 0↓)
name PR 22cd9de (ns) base 48c33e8 (ns) ratio (PR/base)
fineweb_q00/duckdb:parquet 30055615 32568662 0.92
fineweb_q01/duckdb:parquet 87156671 84683803 1.03
fineweb_q02/duckdb:parquet 85415137 85608332 1.00
fineweb_q03/duckdb:parquet 317422238 318330464 1.00
fineweb_q04/duckdb:parquet 448341676 445922508 1.01
fineweb_q05/duckdb:parquet 420763340 417392711 1.01
fineweb_q06/duckdb:parquet 206529269 204719688 1.01
fineweb_q07/duckdb:parquet 218946768 218447287 1.00
fineweb_q08/duckdb:parquet 35995794 34237420 1.05

File Size Changes (2 files changed, -0.1% overall, 0↑ 2↓)
| File | Scale | Format | Base | HEAD | Change | % |
|---|---|---|---|---|---|---|
| sample.vortex | 1.0 | vortex-compact | 1.23 GB | 1.23 GB | -4.02 KB | -0.0% |
| sample.vortex | 1.0 | vortex-file-compressed | 1.43 GB | 1.43 GB | -1.91 MB | -0.1% |

Totals:

  • vortex-compact: 1.23 GB → 1.23 GB (-0.0%)
  • vortex-file-compressed: 1.43 GB → 1.43 GB (-0.1%)

Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
@gatesn gatesn added the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Jun 17, 2026
@github-actions github-actions Bot removed the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Jun 17, 2026
Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn added the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Jun 17, 2026
@gatesn gatesn disabled auto-merge June 17, 2026 19:59
@github-actions github-actions Bot removed the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Jun 17, 2026
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
@@ -0,0 +1,199 @@
// SPDX-License-Identifier: Apache-2.0

Contributor

Unrelated

AdamGS commented Jun 18, 2026

Contributor

I think a few commits from other branches made it here by mistake

@joseph-isaacs joseph-isaacs left a comment

Contributor

Almost there. I trust the other matches; can we fuzz before merging?

Also needs to comments


Vortex uses statistics to prove when a filter cannot match a row group, zone, or
file. The proof expression returns `true` when the input can be skipped. It
returns `false` or `null` when pruning is not proven.

Contributor

why both?

Contributor

I think it's just easier to map nulls to nulls.
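
The "`true` means skip, `false` or `null` means not proven" rule from the quoted docs can be sketched with `Option<bool>` as a nullable boolean. This is an illustrative model under stated assumptions, not the Vortex implementation: `falsify_gt`, `or3`, and `can_skip` are hypothetical names, and `None` stands in for `null`.

```rust
/// Proof that `x > threshold` cannot match: it holds when `max(x) <= threshold`.
/// A missing statistic yields `None` (null), i.e. nothing is proven.
fn falsify_gt(max: Option<i64>, threshold: i64) -> Option<bool> {
    max.map(|m| m <= threshold)
}

/// Three-valued (Kleene) OR. The proof for `p AND q` is `proof(p) OR proof(q)`:
/// if either conjunct can never match, the conjunction can never match.
fn or3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Only a proven `true` permits skipping; `false` and null both mean "read it".
fn can_skip(proof: Option<bool>) -> bool {
    proof == Some(true)
}
```

Because `can_skip` treats `Some(false)` and `None` the same way, nulls from missing statistics can flow through the proof expression unchanged and are only collapsed at the final decision.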

#[derive(Clone)]
pub struct Binary;

fn simplify_and(lhs: &Expression, rhs: &Expression) -> Option<Expression> {

Contributor

move this

Comment thread vortex-array/src/scalar_fn/fns/binary/mod.rs Outdated
Comment thread vortex-array/src/scalar_fn/fns/binary/mod.rs Outdated

@joseph-isaacs joseph-isaacs left a comment

Contributor

oops

Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
@@ -0,0 +1,199 @@
// SPDX-License-Identifier: Apache-2.0

Contributor

The rebase seems wrong?

`true`, the file stats reader can return an all-false pruning mask without
reading child layouts.

Scan planning uses `checked_pruning_expr` to lower a falsified expression against

Contributor

this is gone?

Comment thread vortex-array/src/stats/bind.rs Outdated
Comment thread vortex-duckdb/src/projection.rs Outdated
gatesn added 4 commits June 18, 2026 12:37
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
Signed-off-by: "Nicholas Gates" <nick@nickgates.com>
@gatesn gatesn enabled auto-merge (squash) June 18, 2026 17:44
@gatesn gatesn merged commit aef6307 into develop Jun 18, 2026
92 of 95 checks passed
@gatesn gatesn deleted the ngates/public-stats-rewrite-rules branch June 18, 2026 18:00
Labels

changelog/chore A trivial change

4 participants